PivotalR: A Package for Machine Learning on Big Data
نویسنده
چکیده
PivotalR [1] is an R package that provides a front-end to PostgreSQL [2] and all PostgreSQL-like databases such as Pivotal Inc.'s Greenplum Database (GPDB) [3], HAWQ [4] on Hadoop. PivotalR also provides the R wrapper for MADlib [5]. MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning algorithms for structured and unstructured data. Thus PivotalR also enables the user to apply machine learning algorithms onto big data.
منابع مشابه
Debt Collection Industry: Machine Learning Approach
Businesses are increasingly interested in how big data, artificial intelligence, machine learning, and predictive analytics can be used to increase revenue, lower costs, and improve their business processes. In this paper, we describe how we have developed a data-driven machine learning method to optimize the collection process for a debt collection agency. Precisely speaking, we create a frame...
متن کاملMachine Learning and Citizen Science: Opportunities and Challenges of Human-Computer Interaction
Background and Aim: In processing large data, scientists have to perform the tedious task of analyzing hefty bulk of data. Machine learning techniques are a potential solution to this problem. In citizen science, human and artificial intelligence may be unified to facilitate this effort. Considering the ambiguities in machine performance and management of user-generated data, this paper aims to...
متن کاملA Hybrid Algorithm based on Deep Learning and Restricted Boltzmann Machine for Car Semantic Segmentation from Unmanned Aerial Vehicles (UAVs)-based Thermal Infrared Images
Nowadays, ground vehicle monitoring (GVM) is one of the areas of application in the intelligent traffic control system using image processing methods. In this context, the use of unmanned aerial vehicles based on thermal infrared (UAV-TIR) images is one of the optimal options for GVM due to the suitable spatial resolution, cost-effective and low volume of images. The methods that have been prop...
متن کاملClassification Algorithms for Big Data Analysis, a Map Reduce Approach
Since many years ago, the scientific community is concerned about how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data that is being generated every day by remote sensors raises more challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP), whic...
متن کاملThe Gamma Operator for Big Data Summarization on an Array DBMS
SciDB is a parallel array DBMS that provides multidimensional arrays, a query language and basic ACID properties. In this paper, we introduce a summarization matrix operator that computes sufficient statistics in one pass and in parallel on an array DBMS. Such sufficient statistics benefit a big family of statistical and machine learning models, including PCA, linear regression and variable sel...
متن کامل